@lmmx (Member) commented Nov 7, 2025

🏗️ Beginning setup for tagged enums


Related:

Two facet-json issues discuss an equivalent of serde's untagged, which is out of scope for this initial tagged implementation.

This facet issue discussed "defining a dedicated tag attribute" (which was implemented and used in facet-asn1).


Development log:

Outline of work (giacometti dev journal entry)

Journal Entry: 2025-11-07 Tagged Enum Deserialization

Current State

  • facet-core Shape stores type_tag: Option<&'static str> field for format-specific type identification
  • facet-macros-emit emits type_tag from #[facet(type_tag = "...")] container attribute into Shape builder (process_enum.rs:434, process_struct.rs:199-205)
  • facet-json deserializer implements Format trait with next() method returning Outcome enum (deserialize.rs:23-121)
  • facet-json tokenizer parses JSON tokens with position tracking and skip_whitespace() method (tokenizer.rs:134-147, 189-384)
  • facet-deserialize StackRunner handles object field matching via object_key_or_object_close() (lib.rs:1133-1299)
  • Enum variant selection uses wip.find_variant() matching key against variant names (lib.rs:1173-1187)
  • Variant effective names account for facet rename rules via PVariant::name (process_enum.rs:113-115)
  • ASN.1 uses type_tag for protocol tag values via Asn1TypeTag parsing (facet-asn1/src/tag.rs:92-208)
  • facet-serialize serialize_iterative drives serialization from Peek values (serialize.rs:33)

Missing

  • Tokenizer peek_object_field() method for non-consuming field value lookup (tokenizer.rs needs new method after parse_literal around line 384)
  • Format::next() logic to detect enum with type_tag when Token::LBrace encountered (deserialize.rs:75 needs type_tag check)
  • Format::next() variant selection via wip.select_nth_variant() when discriminator matches variant name (deserialize.rs:75 needs selection before ObjectStarted return)
  • Import of Type and UserType from facet_core in deserialize.rs for type checking
  • StackRunner::object_key_or_object_close() logic to skip discriminator field when processing enum object fields (lib.rs:1173 needs discriminator check before variant matching)
  • Serializer emission of discriminator field for tagged enums before variant fields (serialize.rs or facet-serialize investigation needed)
  • Tests for tagged enum deserialization with rename rules (facet-json/tests/tagged_enums.rs)
  • Tests for tree-sitter schema structures with nested tagged enums (facet-json/tests/tree_sitter_schema.rs)
  • Documentation of tagged enum usage with type_tag attribute and tree-sitter example (facet-json/README.md)

Implementation plan

Analysis of Current Implementation

facet-core Shape representation

  • Shape::type_tag field exists as Option<&'static str> (used in facet-asn1/src/lib.rs:79-83)
  • Stored via .type_tag(value) builder method in generated Shape construction
  • Container-level attribute, applied to enums and structs
  • ASN.1 uses type_tag to specify protocol tag values (facet-asn1/src/tag.rs:92-208)
  • JSON will reuse type_tag to specify discriminator field name

facet-macros-emit enum handling

  • process_enum.rs emits type_tag on EnumType's Shape if #[facet(type_tag = "...")] present on container (line 434)
  • Variant names stored in Variant::builder().name(token) where token is the effective name after rename rules (line 115)
  • PVariant::name has both raw (IdentOrLiteral) and effective (String) components (used at line 113)
  • Effective name already accounts for #[facet(rename = "...")] attributes
  • No changes needed to macro emit code for this feature

facet-json deserialization

  • deserialize.rs implements Format trait with next() and skip() methods (lines 23-121)
  • Returns Outcome enum variants: Scalar, ObjectStarted, ObjectEnded, ListStarted, ListEnded (lib.rs:92-103)
  • No logic checks Shape::type_tag field
  • No enum variant discrimination based on object field values
  • Tokenizer operates character-by-character with position tracking (tokenizer.rs:189-384)

facet-deserialize framework

  • Generic deserialization in lib.rs:deserialize_wip() drives parsing loop (lines 340-433)
  • StackRunner maintains parsing state with instruction stack (lines 839-905)
  • StackRunner::object_key_or_object_close() handles object field matching (lines 1133-1299)
  • Enum variant selection via wip.find_variant(&key) when key matches variant name (line 1173)
  • NextData carries input, runner state, and work-in-progress value through parsing (lines 226-245)

Target Behavior

Example code

#[derive(Facet)]
#[facet(type_tag = "type")]
#[repr(u8)]
enum Rule {
    #[facet(rename = "REPEAT")]
    Repeat { content: Box<Rule> },
    
    #[facet(rename = "REPEAT1")]
    Repeat1 { content: Box<Rule> },
    
    #[facet(rename = "SYMBOL")]
    Symbol { name: String },
}

Example JSON inputs

{ "type": "REPEAT", "content": { "type": "SYMBOL", "name": "x" } }
{ "type": "REPEAT1", "content": { "type": "SYMBOL", "name": "y" } }
{ "type": "SYMBOL", "name": "z" }

Deserialization behavior

  • When deserializing enum with type_tag = "type", look for object field named "type"
  • Match field value against variant effective names ("REPEAT", "REPEAT1", "SYMBOL")
  • Select matching variant before processing remaining object fields
  • Skip discriminator field when processing object fields (already used for variant selection)
  • Error if the discriminator value matches no variant name (see the end-to-end sketch below)
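
End-to-end, the first example input above is expected to produce the corresponding variant. A minimal sketch (assuming Debug and PartialEq derives are added to the example Rule enum for the assertion):

// Rule as in the example above, with Debug + PartialEq derives added for the assert
let json = r#"{ "type": "REPEAT", "content": { "type": "SYMBOL", "name": "x" } }"#;
let rule: Rule = facet_json::from_str(json).unwrap();
assert_eq!(
    rule,
    Rule::Repeat { content: Box::new(Rule::Symbol { name: "x".into() }) }
);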

Serialization behavior

  • When serializing enum with type_tag = "type", emit discriminator field first
  • Field name from shape.type_tag, value from active variant's effective name
  • Then emit the variant's actual fields (expected output sketched below)
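
For concreteness, a sketch of the intended output for the example enum above (only the discriminator-first ordering is prescribed; exact whitespace is not):

let rule = Rule::Repeat {
    content: Box::new(Rule::Symbol { name: "x".into() }),
};
let json = facet_json::to_string(&rule);
// expected (modulo whitespace): {"type":"REPEAT","content":{"type":"SYMBOL","name":"x"}}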

Required Changes

1. Add peek functionality to Tokenizer

File: facet-json/src/tokenizer.rs

Location: Add new method after parse_literal() (around line 384)

Changes required:

  • Add peek_object_field(&mut self, field_name: &str) -> Option<Spanned<Token<'input>>> method
  • Method saves current position, scans object for matching field name, returns value token, restores position
  • Returns None if not an object, field not found, or parse error

Implementation:

impl<'input> Tokenizer<'input> {
    /// Peek at an object's field value without consuming tokens
    /// Returns None if not an object or field not found
    pub fn peek_object_field(&mut self, field_name: &str) -> Option<Spanned<Token<'input>>> {
        let saved_pos = self.pos;
        
        // Skip whitespace
        self.skip_whitespace();
        
        // Expect LBrace
        match self.next_token() {
            Ok(token) if matches!(token.node, Token::LBrace) => {},
            _ => {
                self.pos = saved_pos;
                return None;
            }
        }
        
        // Scan for field_name key
        loop {
            self.skip_whitespace();
            
            let key_token = match self.next_token() {
                Ok(t) => t,
                Err(_) => {
                    self.pos = saved_pos;
                    return None;
                }
            };
            
            match key_token.node {
                Token::String(ref key) if key.as_ref() == field_name => {
                    // Skip colon
                    self.skip_whitespace();
                    match self.next_token() {
                        Ok(token) if matches!(token.node, Token::Colon) => {},
                        _ => {
                            self.pos = saved_pos;
                            return None;
                        }
                    }
                    // Get value token
                    self.skip_whitespace();
                    let value = match self.next_token() {
                        Ok(v) => v,
                        Err(_) => {
                            self.pos = saved_pos;
                            return None;
                        }
                    };
                    self.pos = saved_pos;
                    return Some(value);
                }
                Token::String(_) => {
                    // Skip this key's colon and value
                    self.skip_whitespace();
                    match self.next_token() {
                        Ok(token) if matches!(token.node, Token::Colon) => {},
                        _ => {
                            self.pos = saved_pos;
                            return None;
                        }
                    }
                    self.skip_whitespace();
                    // Skip the value. Note: this consumes a single token, so a key
                    // whose value is a nested object or array (appearing before the
                    // discriminator) is not fully skipped; supporting that would need
                    // depth tracking.
                    match self.next_token() {
                        Ok(_) => {},
                        Err(_) => {
                            self.pos = saved_pos;
                            return None;
                        }
                    }
                    // Check for comma or end
                    self.skip_whitespace();
                    match self.next_token() {
                        Ok(token) if matches!(token.node, Token::Comma) => continue,
                        Ok(token) if matches!(token.node, Token::RBrace) => break,
                        _ => break,
                    }
                }
                Token::RBrace => break,
                _ => break,
            }
        }
        
        self.pos = saved_pos;
        None
    }
}
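
A usage sketch of the proposed method (hypothetical: it assumes the tokenizer consumes a raw input slice, as the Format::next() change below suggests, and that the peek restores the position):

// Peek at the discriminator without consuming anything.
let input = br#"{ "type": "REPEAT", "content": "x" }"#;
let mut tokenizer = Tokenizer::new(input);

if let Some(tag) = tokenizer.peek_object_field("type") {
    if let Token::String(value) = tag.node {
        assert_eq!(value.as_ref(), "REPEAT");
    }
}
// Position is restored, so regular parsing still starts at the LBrace.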

2. Modify Format::next() to handle tagged enums

File: facet-json/src/deserialize.rs

Location: Token::LBrace handler in next() method (around line 75)

Changes required:

  • Import Type and UserType from facet_core at top of file
  • When Token::LBrace encountered, check if nd.wip.shape() is enum with type_tag
  • If yes, create new tokenizer on input slice starting at nd.start()
  • Call peek_object_field() with discriminator field name
  • Match returned string value against variant effective names
  • Call nd.wip.select_nth_variant(index) when match found
  • Return error using existing DeserErrorKind::NoSuchVariant when no match
  • Continue with normal ObjectStarted return

Implementation:

// Add to imports at top of file:
use facet_core::{Type, UserType};

// In next() method, replace Token::LBrace handler (around line 75):
Token::LBrace => {
    // Check if we're deserializing into an enum with type_tag
    let wip_shape = nd.wip.shape();
    if let (Some(discriminator_field), Type::User(UserType::Enum(et))) = 
        (wip_shape.type_tag, wip_shape.ty) 
    {
        // Peek at the discriminator field value
        let input_slice = &nd.input()[nd.start()..];
        let mut peek_tokenizer = Tokenizer::new(input_slice);
        
        if let Some(tag_token) = peek_tokenizer.peek_object_field(discriminator_field) {
            if let Token::String(tag_value) = tag_token.node {
                // Find matching variant by effective name
                let mut found_variant = None;
                for (idx, variant) in et.variants.iter().enumerate() {
                    if variant.name == tag_value.as_ref() {
                        found_variant = Some(idx);
                        break;
                    }
                }
                
                match found_variant {
                    Some(variant_index) => {
                        // Select the variant before processing object fields.
                        // (Error handling here is schematic: in the real next() this
                        // should follow the same `(nd, Err(...))` return shape as the
                        // no-match arm below rather than using `?`.)
                        nd.wip.select_nth_variant(variant_index)
                            .map_err(|e| DeserErrorKind::ReflectError(e).with_span(span))?;
                    }
                    None => {
                        return (nd, Err(DeserErrorKind::NoSuchVariant {
                            name: tag_value.into_owned(),
                            enum_shape: wip_shape,
                        }.with_span(span)));
                    }
                }
            }
        }
    }
    
    // Continue with normal ObjectStarted handling
    Ok(Spanned {
        node: Outcome::ObjectStarted,
        span,
    })
}

3. Skip discriminator field during object deserialization

File: facet-deserialize/src/lib.rs

Location: StackRunner::object_key_or_object_close() enum handling (around line 1173)

Changes required:

  • When processing enum object keys in Type::User(UserType::Enum(_ed)) branch
  • Before variant matching logic, check if key matches shape.type_tag
  • If discriminator field encountered, set ignore = true and skip processing
  • Continue with existing variant matching logic for other keys

Implementation:

// In object_key_or_object_close(), modify Type::User(UserType::Enum(_ed)) case (around line 1173):
Type::User(UserType::Enum(_ed)) => {
    // Check if this key is the discriminator field
    if let Some(discriminator_field) = shape.type_tag {
        if key.as_ref() == discriminator_field {
            // This is the discriminator field - skip it
            trace!("Skipping discriminator field '{}'", discriminator_field);
            ignore = true;
            self.stack.push(Instruction::ObjectKeyOrObjectClose);
            self.stack.push(Instruction::SkipValue);
            return Ok(wip);
        }
    }
    
    match wip.find_variant(&key) {
        // ... existing variant matching logic unchanged ...
    }
}

4. Emit discriminator field during serialization

File: facet-json/src/serialize.rs

Location: Requires examining facet-serialize's serialize_iterative usage

Investigation needed: The serialize.rs file uses facet_serialize::serialize_iterative() (line 33) which takes a Peek and drives serialization. Need to examine how serialize_iterative calls the serializer for enum variants.

Expected location: When serialize_iterative processes an enum variant's fields, before calling serialize_field_name() for the first actual field.

Changes required:

  • Before serializing enum variant fields, check if enum shape has type_tag
  • If present, emit discriminator field first via serializer.serialize_field_name(discriminator_field)? then serializer.serialize_str(variant.name)?
  • Then continue with normal field serialization

Note: The exact implementation depends on facet-serialize's serialize_iterative structure. It may require changes in the facet-serialize crate itself if serialize_iterative doesn't expose hooks for pre-field emission; if changes are needed in facet-serialize, document them as a separate task.

Fallback approach if modifying serialize_iterative is not feasible: override serialization for tagged enums by implementing custom logic in JsonSerializer that checks for an enum with type_tag before calling serialize_iterative. A rough sketch of the discriminator emission follows.
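
A minimal sketch of that emission, wherever the hook ends up living (the Serializer bound and its Error type are placeholders; only the serialize_field_name / serialize_str calls named above are assumed):

// `Serializer` and its Error stand in for whatever facet-serialize hands the
// JSON serializer; adapt to the real trait once serialize_iterative is examined.
fn emit_discriminator<S: Serializer>(
    serializer: &mut S,
    shape: &'static facet_core::Shape,
    variant: &facet_core::Variant,
) -> Result<(), S::Error> {
    if let Some(discriminator_field) = shape.type_tag {
        // Emit e.g. "type": "REPEAT" before the variant's own fields.
        serializer.serialize_field_name(discriminator_field)?;
        serializer.serialize_str(variant.name)?;
    }
    Ok(())
}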

Testing Requirements

Unit tests in facet-json

File: facet-json/tests/tagged_enums.rs (new file)

Test cases:

use facet::Facet;

#[test]
fn deserialize_tagged_enum_basic() {
    #[derive(Facet, Debug, PartialEq)]
    #[facet(type_tag = "type")]
    #[repr(u8)]
    enum Rule {
        #[facet(rename = "REPEAT")]
        Repeat { content: String },
        #[facet(rename = "REPEAT1")]
        Repeat1 { content: String },
    }
    
    let json = r#"{"type": "REPEAT", "content": "test"}"#;
    let rule: Rule = facet_json::from_str(json).unwrap();
    assert_eq!(rule, Rule::Repeat { content: "test".into() });
    
    let json2 = r#"{"type": "REPEAT1", "content": "test"}"#;
    let rule2: Rule = facet_json::from_str(json2).unwrap();
    assert_eq!(rule2, Rule::Repeat1 { content: "test".into() });
}

#[test]
fn deserialize_tagged_enum_field_order_independent() {
    #[derive(Facet, Debug, PartialEq)]
    #[facet(type_tag = "type")]
    #[repr(u8)]
    enum Rule {
        #[facet(rename = "REPEAT")]
        Repeat { content: String },
    }
    
    // Discriminator field after content field
    let json = r#"{"content": "test", "type": "REPEAT"}"#;
    let rule: Rule = facet_json::from_str(json).unwrap();
    assert_eq!(rule, Rule::Repeat { content: "test".into() });
}

#[test]
fn error_on_unknown_variant_tag() {
    #[derive(Facet, Debug)]
    #[facet(type_tag = "type")]
    #[repr(u8)]
    enum Rule {
        #[facet(rename = "REPEAT")]
        Repeat { content: String },
    }
    
    let json = r#"{"type": "UNKNOWN", "content": "test"}"#;
    let result: Result<Rule, _> = facet_json::from_str(json);
    assert!(result.is_err());
    let err_msg = result.unwrap_err().to_string();
    assert!(err_msg.contains("UNKNOWN"));
}

#[test]
fn deserialize_nested_tagged_enums() {
    #[derive(Facet, Debug, PartialEq)]
    #[facet(type_tag = "type")]
    #[repr(u8)]
    enum Rule {
        #[facet(rename = "REPEAT")]
        Repeat { content: Box<Rule> },
        #[facet(rename = "SYMBOL")]
        Symbol { name: String },
    }
    
    let json = r#"{"type": "REPEAT", "content": {"type": "SYMBOL", "name": "x"}}"#;
    let rule: Rule = facet_json::from_str(json).unwrap();
    match rule {
        Rule::Repeat { content } => {
            match *content {
                Rule::Symbol { name } => assert_eq!(name, "x"),
                _ => panic!("Expected Symbol variant"),
            }
        }
        _ => panic!("Expected Repeat variant"),
    }
}

#[test]
fn serialize_tagged_enum() {
    #[derive(Facet, Debug, PartialEq)]
    #[facet(type_tag = "type")]
    #[repr(u8)]
    enum Rule {
        #[facet(rename = "REPEAT")]
        Repeat { content: String },
    }
    
    let rule = Rule::Repeat { content: "test".into() };
    let json = facet_json::to_string(&rule);
    
    // Parse back to verify round-trip
    let parsed: Rule = facet_json::from_str(&json).unwrap();
    assert_eq!(parsed, rule);
    
    // Verify discriminator field present
    assert!(json.contains("\"type\""));
    assert!(json.contains("\"REPEAT\""));
}

Integration test

File: facet-json/tests/tree_sitter_schema.rs (new file)

Test case:

use facet::Facet;

#[derive(Facet, Debug, PartialEq)]
#[facet(type_tag = "type")]
#[repr(u8)]
enum Rule {
    #[facet(rename = "REPEAT")]
    Repeat { content: Box<Rule> },
    
    #[facet(rename = "REPEAT1")]
    Repeat1 { content: Box<Rule> },
    
    #[facet(rename = "SYMBOL")]
    Symbol { name: String },
}

#[test]
fn deserialize_tree_sitter_rules() {
    let json1 = r#"{"type": "REPEAT", "content": {"type": "SYMBOL", "name": "x"}}"#;
    let rule1: Rule = facet_json::from_str(json1).unwrap();
    match rule1 {
        Rule::Repeat { content } => {
            match *content {
                Rule::Symbol { name } => assert_eq!(name, "x"),
                _ => panic!("Expected Symbol variant"),
            }
        }
        _ => panic!("Expected Repeat variant"),
    }
    
    let json2 = r#"{"type": "REPEAT1", "content": {"type": "SYMBOL", "name": "y"}}"#;
    let rule2: Rule = facet_json::from_str(json2).unwrap();
    match rule2 {
        Rule::Repeat1 { content } => {
            match *content {
                Rule::Symbol { name } => assert_eq!(name, "y"),
                _ => panic!("Expected Symbol variant"),
            }
        }
        _ => panic!("Expected Repeat1 variant"),
    }
    
    let json3 = r#"{"type": "SYMBOL", "name": "z"}"#;
    let rule3: Rule = facet_json::from_str(json3).unwrap();
    match rule3 {
        Rule::Symbol { name } => assert_eq!(name, "z"),
        _ => panic!("Expected Symbol variant"),
    }
}

#[test]
fn serialize_tree_sitter_rules() {
    let rule1 = Rule::Repeat { 
        content: Box::new(Rule::Symbol { name: "x".into() }) 
    };
    let json1 = facet_json::to_string(&rule1);
    let parsed1: Rule = facet_json::from_str(&json1).unwrap();
    assert_eq!(parsed1, rule1);
    
    let rule2 = Rule::Symbol { name: "z".into() };
    let json2 = facet_json::to_string(&rule2);
    let parsed2: Rule = facet_json::from_str(&json2).unwrap();
    assert_eq!(parsed2, rule2);
}

Documentation Requirements

README.md updates

File: facet-json/README.md

Location: After existing content, add new section

Content:

## Tagged Enum Deserialization

Facet-json supports internally-tagged enum deserialization using the `type_tag` attribute:

```rust
use facet::Facet;

#[derive(Facet)]
#[facet(type_tag = "type")]
#[repr(u8)]
enum Rule {
    #[facet(rename = "REPEAT")]
    Repeat { content: Box<Rule> },
    
    #[facet(rename = "REPEAT1")]
    Repeat1 { content: Box<Rule> },
    
    #[facet(rename = "SYMBOL")]
    Symbol { name: String },
}
```

The `type_tag` attribute specifies which JSON object field contains the variant discriminator. Variant names (after `#[facet(rename = "...")]` rules) are matched against discriminator values.

Example JSON:

```json
{"type": "REPEAT", "content": {"type": "SYMBOL", "name": "x"}}
{"type": "SYMBOL", "name": "z"}
```

This enables deserialization of schemas like tree-sitter's grammar.json where different rule types share similar structure but are distinguished by a discriminator field.

Migration Path

Backwards compatibility

  • No breaking changes
  • Existing enums without #[facet(type_tag = "...")] continue working unchanged
  • New tagged enum behavior only activates when type_tag attribute present on enum container
  • ASN.1 usage of type_tag unaffected (different interpretation per format)

Deprecations

  • None required

Known Limitations

Not implemented in MVP

  • Adjacently-tagged enum representation
  • Untagged enum representation
  • Transparent enum variants

Workarounds

  • For adjacently-tagged enums: manually wrap in a struct with separate tag and content fields
  • For untagged enums: use a struct with flatten attributes
  • For transparent enum variants: defer to a future enhancement

Inline code-scanning annotations on the new test code:

#[derive(Facet, Debug, PartialEq)]
#[facet(type_tag = "type")]
#[repr(u8)]
enum Rule {

Check warning (Code scanning / clippy, test): variant Repeat1 is never constructed

@lmmx (Member, Author): Will resolve when the test is fixed

Repeat { content: Box<Rule> },

#[facet(rename = "REPEAT1")]
Repeat1 { content: Box<Rule> },

Check warning (Code scanning / clippy, test): variant Repeat1 is never constructed

@lmmx (Member, Author): Will resolve when the test is fixed

@lmmx changed the title from "test(tdd): failing test for internal tagged enum" to "Tagged enum" on Nov 7, 2025
@lmmx linked an issue (json: Support representing enum variants as tags) on Nov 7, 2025 that may be closed by this pull request
@lmmx self-assigned this on Nov 7, 2025
@lmmx marked this pull request as draft on November 7, 2025 at 17:12
@lmmx added the ✨ enhancement (New feature or request) and 📜 derive (Related to the derive macro) labels and removed the enhancement label on Nov 7, 2025